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Abstract. We present a new time-stepping algorithm for nonlinear PDEs 
that exhibit scale separation in time of a highly oscillatory nature. The al- 
gorithm combines the parareal method — a parallel-in-time scheme introduced 
in |24) — with techniques from the Heterogeneous Multiscale Method (HMM) 
(cf . | 13l [2| ) , which make use of the slow asymptotic structure of the equations 
[32| . |27j . |28j . |36| . By numerically computing a locally asymptotic solution 
we are able to "factor out" the fast oscillatory part of the solution, and solve 
for the remaining slow part of the solution using time steps that are orders 
of magnitude larger than standard time-stepping methods allow. The scheme 
has two elements. First, we use HMM to numerically advance the asymptotic 
form of the equations with a large time step AT. Second, we refine the solu- 
tion in parallel on the subintervals [nAT, (n + 1) AT] using small time steps 
At and the iterative scheme scheme in |24| : the intermediate solutions on the 
sub-intervals [nAT, (n -|- 1) AT] are ephemeral, and are used only to converge 
the overall solution at the large time steps nAT. Using the asymptotic struc- 
ture allows the computed solutions to be close enough to the actual solution 
that parallel-in-time methods converge to high accuracy and with significant 
parallel speedup. 

We present error bounds, based on the analysis in [17 , that demonstrate 
convergence of the method. A complexity analysis also demonstrates that 
the parallel speedup increases arbitrarily with greater scale separation. Fi- 
nally, we demonstrate the accuracy and efhciency of the method on the (one- 
dimensional) rotating shallow water equations, which is a standard test prob- 
lem for new algorithms in geophysical fluid problems. Compared to exponential 
integrators such as ETDRK4 and Strang splitting — which solve the stiff oscil- 
latory part exactly — we find that we can use coarse time steps AT that are 
orders of magnitude larger (for a comparable accuracy) , yielding an estimated 
parallel speedup of approximately 100 for physically realistic parameter val- 
ues. For the (one-dimensional) shallow water equations, we also show that the 
estimated parallel speedup of this "asymptotic parareal method" is more than 
a factor of 10 greater than the speedup obtained from the standard parareal 
method. 



1. Introduction 

We present a new algoritlini to integrate nonlinear PDEs that exhibit scale sep- 
aration in time. We focus on time scale separation of a highly oscillatory nature, 
where standard (explicit or implicit) time-stepping methods often require time steps 
that are on the order of the fastest oscillation to achieve accuracy. This type of 
equation arises in numerous scientific applications, including the large-scale simu- 
lations of the ocean and atmosphere that serve as the primary motivation of this 
paper [HIS]. 

1 
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In particular, we consider computing solutions to equations of the form 
(1.1) ^ + ^/:u = A/-(u)+I?u, u(0)=uo, 

where the linear operator £ has pure imaginary eigenvalues, the nonlinear term 
N (u) is of polynomial type, the operator V encodes some form of dissipation, 
and e is a small non-dimensional parameter. For notational simplicity, we let u (t) 
denote the spatial (vector-valued) function u(i, •) — {ui (t, ■) ,U2 (t, ■) , . . .). The 
operator €~^C results in temporal oscillations on an order O (e) time scale, and 
generally necessitates small time steps if standard numerical integrators are used. 



Our approach for integrating (1.1 1 uses a variant of the parareal algorithm |24| . 
which is a parallel-in-time method that relies on a cheap coarse solver for computing 
in serial a solution with low accuracy, and a more expensive fine solver for itera- 
tively refining the solutions in parallel. The key novelty in this paper is to replace 



the numerical coarse solution of the full equations (1.1 1 with a locally asymptotic 



approximation of (1.1). This enables us to effectively bypass the Nyquist constraint 
imposed by the fastest oscillations, and acheive much greater parallel speedup. Ex- 
amples on the rotating shallow water equations — a standard benchmark against 
which to test new algorithms in geophysical fluid applications — demonstrate that 
this approach holds promise for increasing the accuracy and speed of geophysical 



fluid simulations. In fact, we flnd (see Section 6.2 1 that this approach allows us to 
take step sizes AT 3> e that are significantly larger than if alternative schemes are 
used for the coarse solver, including exponential integrators and split-step methods 
(which solve the stiff linear terms exactly) . In the context of large-scale simulations 
of the ocean and atmosphere, the gains achieved from spatial parallelization alone 
are beginning to saturate, and the results in this paper are a preliminary effort 
toward achieving greater efficiency. 

We first describe the standard parareal method in more detail, and some of the 



challenges for acheiving high parallel speedup for problems of the form (1.1 1. The 



basic approach of the parareal method is to take large time steps AT in serial using 



a coarse integrator of ( 1 . 1 1 , and to iteratively refine the solutions in parallel using 
small time steps At and a more accurate integrator. This can result in significant 
speedup in real (wall-clock) time if the parareal iterations converge rapidly, and 
either the ratio AT/ At of coarse and fine step sizes is large, or the cost of the 
coarse solver is much cheaper than that of the fine solver (see Section [s] for more 
details). Early applications of the parareal method include simulations of molecular 
dynamics j4], the Navier Stokes equation jl5| , and quantum control problems [26]; 
additional references can be found in [34 . Although the parareal method has been 
most widely used for parabolic-type PDEs, it has also been analyzed and used for 
accurate simulation of first and and second order hyperbolic systems (cf . [T3] , [TB] , 
and |TU]). A recent variant of the parareal method also allows for the accurate long- 
time evolution of Hamiltonian systems ^23] . Finally, general convergence results for 
the parareal algorithm can be found in |17l [5] ( |17j also numerically demonstrates 
convergence on the Lorenz equations, which is of particular relevance to geophysical 
fiuid problems). 

Despite the many successes of the parareal method, a basic obstacle remains 



for equations of the form (1.1): namely, the step size AT for a coarse integrator 
that is based on a standard method generally must satisfy AT = O (e) in order to 
achieve any accuracy at all (which is a prerequisite for convergence of the parareal 
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method). In practice, this can mean that the coarse integrator in the parareal 



method must use very small time steps for solving (1.1), and the parallel speedup 
can be minimal. There are, however, some types of highly oscillatory PDEs where 
numerical integrators have been developed that can take much larger time steps 
AT> e (cf. dg). 

In this paper, we use a numerically computed 'locally slow' solution that is 



based on the underlying asymptotic structure of (1.1), and which can allow step 
sizes AT ^ e significantly larger than the Nyquist constraint imposed by the O (e) 
temporal oscillations (and thus a potentially significant parallel speedup). A basic 
observation behind efficiently constructing a slow solution is that the solution u (t) 



to (1.1 1 has the asymptotic approximation u{t) = cxp (— i/e£) u (i) + O (e) (cf. 
|27|. psj. |36j). where the slowly varying function u(t) satisfies a reduced equation 
of the form 

(1.2) ^^JV{u)+Vu, u(0) = uo. 
Here the nonlinear term N (u) is given by the time average 

(1.3) Ar(u(t))= lim ^ r e^^N{e-^^n{t))ds. 

We emphasize that the above time averaging is performed with u [t) held fixed. 
Similarly, 

- 1 

(1.4) X>u(t)= lim -/ (e'^Pe-"^) u(t)ds. 

Note that u(<:), and its time derivatives, are formally bounded independently of 



e, and thus significantly larger time steps AT e can be taken to evolve (1.2 1. 
Section |4] also discusses how a numerical integrator based on a finite version of the 
time averages pTsI and (jOj) can be interpreted as a smoothed type of integrating 



factor method, and allows accuracy even when there is no scale sepatation in time 
(i.e. e — 0(1)); this is useful when the scale separation is localized in space, 
and where it is desirable to have a time step that is constrained only by the slow 
dynamics. 

Despite the many successes of the above averaging procedure in elucidating im- 
portant qualitative features (see e.g. [2S] and [35] for geophysical fiuid dynamics 
applications) , in practice this approach may not be accurate enough for moderately 
small values of e (e.g. e — 10~^ is typical in geophysical fluid applications), where 
the implicit constant hidden in the O (e) notation can be significant [33]. Since 
the parameter e is typically fixed in idealized applications, the resulting asymptotic 
approximation cannot be refined without some additional approach. Another limi- 



tation is that the asymptotic approximation ( 1.2 1 is generally only valid on an O (1) 
time interval, and this situation is usually not improved by adding more terms in 
the asymptotic expansion. For many applications, it is therefore necessary to refine 



this approach in order to approximate (1.1 1 with a given target accuracy and on 
longer time intervals. 



Our approach for computing the asymptotic approximation (1.2 1 — for the pur- 
pose of constructing a slow solution — is based on evaluating the time averages 
A/'(u(i)) and 'Du{t) numerically; this approach has also been used in |6j to solve 
Hamiltonian systems more efficiently (see also [2S] and [TS] for related approaches 
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in geophysical simulations). More generally, our numerical scheme for (1.2) is an 
instance of the Heterogeneous Multiscale Method (HMM) (cf. |13l [5J [T] for selective 
applications to highly oscillatory problems), which is a very general framework for 
efficiently computing approximations to problems that exhibit multiple spatial or 
temporal scales; a review of HMM can be found in [35J. The basic idea is that, by 
integrating in time against a carefully chosen smooth kernel, the time average can 
be performed over a window of length Tq — Tq (e) ^ 1/e; therefore, the overall cost 
of evolving (1.2 1 is asymptotically smaller than the cost of computing ( |1.1[ ) directly, 
and can lead to arbitrarily large efficiency gains. In problems arising in geophysical 
fluid applications, the value e may only be moderately small (e.g. e « 10^^), and 
in such cases numerically computing the average (1.3 1 can be as costly as explicitly 
integrating the full equation (1.1 1. However, the numerical average can itself be 
performed in an embarassingly parallel manner (see Section [2| , and thus is not 
expected to impact the overall (wall-clock) speed of the algorithm. Finally, we re- 
mark that for solutions that develop sharp gradients, it may be necessary to use the 
modified version of the parareal method explored in [10 . However, this is beyond 
the scope of this paper. 

The idea of using a coarse solution based on a modified equation is not new, 
and the possibility has been mentioned early on in the parareal literature (cf. |34|). 
In pS] and [T5], multi-scale versions of the parareal method are developed for, 
respectively, deterministic and stochastic chemical kinetic simulations; in ^2], the 
coarse solver is based on a deterministic (macroscopic) approximation. A recent 
paper [25] applies a version of the parareal method to systems of ODEs that exhibit 
fast and slow components. The multi-scale coarse solver in [22 uses a projection 
onto the slow, low-dimensional manifold, and examples are provided for singularly 
perturbed ODEs with dissipative-type scale separation. In contrast to [5S], |12j . 
and [22[ , here we investigate this procedure for a model nonlinear PDE whose scale 
separation is of a highly oscillatory nature, and where methods that work well for 
stiff dissipative problems (e.g. implicit or exponential integrators) generally fail to 
impart significant speedup. Moreover, the asymptotic approximation (1.2 1 to (1.1) 
cannot be computed explicitly in most cases, and this necessitates using additional 
techniques. The locally asymptotic solver developed here works even when there is 
no scale separation, which is an important feature when the time scale separation 
is a function of space and time (as occurs in some geophysical fiuid applications); 
in this case, the time step in the coarse, asymptotic solution is only constrained by 
the slow dynamics. 

In Section[2] we present aversion of the Heterogeneous Multiscale Method that is 
appropriate for efficiently computing the asymptotic approximation (1.2). We then 
present in Sections [3] a variant of the parareal method that is based on replacing the 
coarse solver with a locally asymptotic approximation. Section [4] shows that, by av- 
eraging over a scale on which the slow dynamics is occuring, this HMM-type coarse 
solution is able to achieve accuracy even when there is no scale separation. We 
provide complexity bounds for this algorithm, which demonstrate that the parallel 
speedup increases arbitrarily as e decreases. We also present error bounds that are 
based on the analysis in [17 (see also [5 ), and that demonstrate convergence of the 
method under reasonable assumptions. Finally, Section [6 . 2 1 discusses some numer- 
ical experiments on the (one-dimensional) rotating shallow water equations, which 
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serve as a standard first test for new numerical algorithms in geophysical applica- 
tions. Our experiments show that, in contrast to standard versions of the parareal 
algorithm, the algorithm can converge to high accuracy in few iterations even when 
large time steps AT ^ e are taken for the coarse solution. In fact, compared to the 
exponential time differencing method (ETDRK4), the integrating factor method, 
and Strang splitting (cf. [F, [2(T, and [2P) — all of which integrate the stiff linear 
term iT^ L exactly — our algorithm yields an estimated parallel speedup of « 100 
for the physically realistic value of e = 10^^. We also show that, for e = 10^^, this 
parallel speedup is at least 10 times greater than the speedup that can be achieved 
by using the standard parareal method with ETDRK4, OIFS, or Strang splitting 
as the coarse solver. Finally, we demonstrate that the asymptotic parareal method 
yields high accuracy even when e = 1 (i.e. in the absense of scale separation), and 
with a parallel speedup that is comparable to using the standard parareal method 
with ETDRK4, OIFS, or Strang splitting as the coarse solver. Using a coarser spa- 
tial discretization in the asymptotic solver may also result in even greater efficiency 
gains. 



2. An ASYMPTOTIC SLOW SOLUTION 



We use the Heterogeneous Multiscale Method (HMM) to solve (1.2), which relies 
on computing the averages (1.3l and (1.4) numerically (see also [B]). The key idea 



is that, by averaging in time with respect to an appropriate smooth kernel and over 
a carefully selected window length To = Tq (e) , the cost is asymptotically smaller 
than 1/e (which is the cost of solving the full equation (l.ll on an O (1) time 
interval). Once such time averages can be computed, then large step sizes AT ^ e, 
coupled with a standard numerical integrator, can be taken to evolve (1.2). For 



simplicity, we restrict our discussion to computing the time average (1.3) (in fact 



for the equati ons we consider here, the operators C and 2? commute, and so the 
average T) in (1.4) satisfies T) = V). 



The basic approach for computing the time average (1.3) involves the following 
approximations : 



N{xx(t)) 



(2.1) 



lim i f e^^Af{e-^^u{t))ds 

T-i-oo J Jq 

rTn . .. . 



M-l 



-E 



M ^ ' \ To 

m=0 ^ " 

The smooth kernel p (s), < s < 1, is chosen so that the length To ~ Tq (e) of the 
time window over which the averaging is done is as small as possible, and that the 
error introduced by using the trapezoidal rule is negligible (see e.g. jT3] and |TT| 
for an error analysis). 

More formally, we define the finite time average 



(2.2) 



A^P,To(u(t)) 



To 
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Then we need to choose the kernel p (s) and the parameters Tq = To (e) and M so 
that the truncation error, 



\W{n{t))-Up,To{nm 



and the discretization error, 



M-l 



m=0 



Til 



M (e 



U{t) 



are smaller than our desired approximation tolerance. This can be accomplished 
by requiring that the kernel p{s) satisfies p^™^ (0) = p^™^ (1) =0, m — 0,1, . . . It 
will also be convenient for comparison with Section |4] to change variables s s/e 
in the integrand in (2.2 1 to obtain the equivalent form 



A^P,To(u(t)) 



1 

eTo Jo 



P 



S 



-s/cC 



u (t) ] ds 



To better understand the role of p (s) and the parameters Tq and M , we in- 
formally analyze the above averaging procedure in more detail (6.11. First, since 
we are assuming that the nonlinear operator A/" in ( 1.3 1 is of polynomial type and 
C has pure imaginary eigenvalues, we can (in principle) expand u (t) in terms of 
eigenfunctions of C and express the nonlinear term in the form 



e'^N'{e-'^u{t)) = ^e'^"W„(u(t)) 



(2.3) 



J2 ^fnin{t))+ J2 e^'"W„(u(i)) 

A„#0 



A„=0 



Here the pure imaginary numbers jA„ are linear combinations of the eigenvalues 



of £, and the set A„ = corresponds to resonant interactions (see Section 6.1 



for a concrete example in the context of the rotating shallow water equations). 
In fact, using the definition of the averaging J\f{u{t)) operator and the above 
decomposition, we see that 



AA(u(t))= ^ AA„(u(t)) 



A„=0 



To compare this to the finite time average JJ'p^To (u {t)), use (2.3 1 to express this as 



A^P,To(u(t)) = 



To Jo 

= E 



Mi 



u{t)) ds 



p{s) e'^"'°'ds 



) (U 



Comparing 'N'p^To (u (t)) and Af (u (t)), we therefore require that 



(2.4) 



p{s) e'^"'°'ds 



1, for A„ = 0, 
0, for A„ ^ 0. 



In order to satisfy (2.4) with a time window length To small as possible, 
we choose a smooth kernel p{s) that satisfies p^™) (0) = p'™^ (1) — 0. Repeated 
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integration by parts shows that 

p (s) e'^-^°'ds 



< Cm lAnTol 



TO = 1,2, 



In particular, the above calculation indicates that choosing a time window 1 <C 
Tq (e) <C 1/e can formally yield an error 

||Ar(u(i))-Arp,To(uW)|| -0(e™), TO>1. 

Moreover, repeated integration by parts, coupled with the Euler-Maclaurin formula, 
also shows that the error induced from the trapezoidal rule is similarly small. 



m=0 ^ " 



U{t) 



= 0(6"), TO>1. 



One commonly used choice of kernel function is given by 

'Cexp(-l/(s(l-s))), 0<s<l, 



0, 



|s| > 1, 



where the constant C is such so that = 1. 



3. The asymptotic Parareal Method 

We briefly review the parareal algorithm, in the context of replacing the coarse 
solver with a numerically computed locally asymptotic solution based on the as- 
ymptotic structure of the equations described by [27j, [28^, 1^ . 



We suppose that we are interested in solving ( 1.1 ) on the time interval t e [0, 1]. 
We let (fit (uq) denote the evolution operator associated with ( |1.1| , so that u {t) = 
ipt (uq) solves (1.1 1. In a similar way, we let ip^ (u p) d enote the evolution operator 
associated with the approximation obtained from (1.2 1, so that u{t) — e!'l^'~Tf>^ (uq) 
is the solution to (1.2 1 and ipt (uq) — (uq) — O (e). 

To describe the parareal method, we first divide the time interval [0, T] into N 
subintervals [nAT, [n + 1)AT], n = . . . ,N — 1. Starting with the identity 

U„ = TpAT (U„-i) + (<PAT (U„_i) - (U„-l)) , U„ = u {nAT) , 

the parareal method computes approximations U^^ w U„ by the iterative procedure: 

\5i = <^AT (U,ti) + (^AT i^tX) - Wat , fc - 1, 2, . . . 

At iteration level fc = 0, the slow approximation — Wat i^n-i) used. Notice 
that, at iteration level k, the quantities U^Z^ in the difference ipAT (Uj^Zi) — 
Wat i^n-i) already computed; consequently, the difference if at (^n-i) ^ 
Wat {^n-i) can be computed in parallel for each n. Since the computation of 
Wat (U^_i) is inexpensive, the overall algorithm is also inexpensive in a parallel 
environment if the iterates converge rapidly. This parareal method is illustrated 
in figure |3.1[ We note that the parareal method can be interpreted as an inexact 
Newton- type iteration (cf. |17|V 

Full pseudocode for the asymptotic parareal method is presented below. In 
the pseudocode, we use the mid-point rule in the HMM-type scheme for the slow 
integrator, and Strang splitting for the fine integrator. We assume that ipAT (uq) is 
computed using M small time steps At (so that AT = MAt), and that ^at ('^o) 
is computed using one big time step AT. 
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Figure 3.1. This figure illustrates the asymptotic parareal algo- 
rithm. The verticl axes represents a typical prognostic variable 
such as h, the thickness of the layer of fluid in the shallow water 
system. The pink line depicts the asymptotic solution at the large 
time steps nAT. The blue line depicts the parallel-in-time, fine 
scale corrections, using small time steps At ^ AT. Finally, the 
black line depicts the updated solution at the large time steps nAT. 




Algorithm 1 Evaluate time average (in parallel) 



AA(uo): 

par for j ^ 1, . . . , M — I: 

Sm = TqUi/M 

end psarfor 

Ui ^ Sum(ui,...,UA/) 



Algorithm 2 asymptotic slow solver 



Coarse_Solver(uo, AT): 
Take a AT/2 timestep for the linear dissipative term: 

Take a AT timestep for the averaged nonlinear term: 

V ^ Ar(v), 



T7/ AT 

V -s- A/ uo + — -V 



2 

Take a AT/2 timestep for the linear dissipative term: 
Trcinsform back to the fast time coordinate: 



Return Ui 
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Algorithm 3 Fine solver 



Fine_Solver(uo, At, AT): 

M = AT/ At 

for m = 1,...,M: 

tcLke At/2 timestep for the linear term: 

tcLke a At timestep for the nonlinear: 

V ^ AA(v), 

V ^ TV I u„j + —V 
tcLke At/2 timestep for the linear term: 

end for 
Return Um- 



Algorithm 4 Parallel-in-time integrator 



Compute the initial guess using the slow solver: 
Ug'd ^ uo 

for n= l,...,iV- 1 

U°" ^ Coarse_Solver(U°"i, AT) 
endf or 

Now refine the solution until convergence: 

Unew , 

while (max„ ||U^,^" - U°"|| / ||U^^"|| > tol) 
pELrf or n — I, . . . , N — 1: 

TTold , Tjnew 

V„ ^ Fine_Solver (U°", At, AT) 

V„ V„ - Coarse_Solver(U°i'^, AT) 
end parfor 
for n = l,...,iV- 1 

U"''" ^ Coarse_Solver (U^,«_"i, AT) + V„_i 
endf or 
end while 



return U' 



new 

N 



4. The parallel-in-time algorithm without scale separation 

In geophysical fluid problems, it is often the case that the time scale separation 
can change in space and time, and it is important that this algorithm works even 
when there is no scale separation. We give a heauristic derivation of the time average 
(2.2 1 from a different point of view, which indicates that the coarse solution yields 
accuracy even when e = O (1), as long as the time average in (2.2) is performed 
over a time scale on which the dynamics of the slow nonlinear terms are occurring. 
Figure schematically depicts how the large time step AT varies as a function of 
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Figure 4. 1 . Schematic of AT as a function of e for (a) the asymp- 
totic parareal method (solid blue line), (b) the standard parareal 
method with a linearly exact coarse solver (dashed red line), and 
(c) a typical time-stepping method used in serial 



Asymptotic-Parareal 



AT 



Parareal- S trang 



y Direct Computation - Strang 



.01 



1 



Figure 4.2. Schematic depiction of the moving time average 




the scale separation parameter e, for the asymptotic parareal method, the standard 
parareal method, and a typical time-stepping method that is used in serial. 

As in the integrating factor method, we first factor out the fast oscillatory part, 

-*A^v(i), 



u(i) 



so that V (t) satisfies 

(4.1) 

Since 



dv 



-t/cC 



v(i) 



dw 

'dt 



Oil) 



V (t) varies more slowly than u (t) and thus time steps AT ^ e can potentially be 
used to solve for v (i). However, simply using a standard time-stepping scheme for 

V (i) will still require small step sizes. In fact, differentiating the equation (4.1) 
shows that v (t) has small but rapid fluctuations. 



and standard time-stepping schemes will not be accurate unless AT is small. 
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The idea (see also j3] and |30j) is to take a time step using a smoothed out 
version of the derivative; this is accomplished by using a moving time average, so 
that the small O (e) fluctuations in the derivative are removed; see Figure [4!2] for a 
schematic depiction. In particular, we average the derivative out over a time scale 
eTo on which the slow dynamics occur (that is, we average over Tq fast oscillations). 
Then using e.g. forward Euler we get the approximation 



where 




This approximation allows error control via two different mechanisms. First, when 
e <C 1 the above approximation is also an asymptotic approximation, and the time 
average serves to eliminate secular terms that arise from nonlinear resonances as 
long as AT ^ e. However, when e = O (1), then we can take eTo — AT , so that the 
time average is performed over a scale AT on which the dynamics is slow. In this 
case, we see that the above approximation is essentially the forward Euler method. 
In fact, in this case the derivatives of v (s) are slow on an O (AT) time scale, and 
we can Taylor expand to get 




= ^{0) + O{AT). 

For systems of ODEs, rigorous error bounds for this "partial time averaging" are 
derived in Section 3.2 of ^31j . 

5. Error and complexity bounds 

We first discuss the complexity of our algorithm. Although this analysis is 
standard for the parareal method, the complexity bounds demonstrate (in the- 
ory) arbitrarily large parallel speedup as the parameter e gets smaller. We assume 
that the time interval [0,1] is divided into N sub-intervals [T„_i,T„] of length 
AT = Tn — Tn-i = 1/N. We also assume that, within each subinterval, M time 
steps of size At are needed for the fine integrator, so that M = AT/ At. 

To obtain an initial guess for the parareal method, we first compute the slow 
approximations = (U°_i), n = 1, . . . , N , which takes a wall-clock time of 
TcN. Next, suppose that we are at a given iteration level k, and we need to compute 
the next iterations U^"*"^ from U^. To do so, we first need to compute, in parallel, 
the difference = (pAT (Ufj) — <^at i^n) between the coarse and fine solutions. 
This takes a wall-clock time of Tc + TfM. We then need to compute, in serial, 
the updated approximations Uj;+^ = <^at„_i (U^^i) + "^n-i; " = l,-.-,N — 
1. This takes a wall-clock time of Tf.N, for an overall cost of TfM + t^N + 
per iteration. Thus, after u iterations, the overall cost of the parareal method is 
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V {t[M + TcN + Tc) + TcN. In contrast, directly solving ( 1.1 1 in serial requires time 
TfNM. Thus, the estimated parallel speedup is 

TfNM . ( T{ M N 

< mm 



Notice that an upper bound on the speedup is proportional to M — AT /At, the 
ratio of slow to fine time step sizes. 

In order for the asymptotic parareal method to converge, the fine solver needs 
time steps At that are some fraction of e; therefore. At ^ e, AT ~ Me, and 
NM 1/At 1/e. By choosing N = M = y/TJe , we see that the parallel 
speedup is given by 

TfNM Tf /T 



ly {t{M + T^N) + T^N V (Tf + Te) + Te V e ■ 

Thus, the estimated parallel speedup increases as e decreases, which results in 
arbitrarily greater efficiency gains relative to standard numerical integrators. 

We now use the theory developed in |17| for our error bounds (the analysis in |S] 
would provide a more refined convergence analysis). We assume that the number 
N of large time steps AT taken satisfies N = e~^/^ (based on the above complex- 
ity analysis, this yields, in principle, optimal parallel speedup). In particular, we 
assume that the slow evolution operator i^^j^ satisfies 

(5.1) II^AT (ui) - Wat (u2)|| < (1 + CqAT) ||ui - U2II . 

We also assume that the difference between the slow and fine approximations sat- 
isfies 

(5.2) ^^j, (uo) - (yfAT (uo) = f (uo) e, 
where the operator f (■) satisfies the bounds 

(5.3) ||f (Ui)-f(u2)|| <Ci||ui-U2||, ||f (uo)ll <C2. 

Letting T„ — nAT, a slight modification of the proof of Theorem 1 in |T7] im- 
mediately yields the following result; we only provide the details that differ from 

Theorem 1. The error, u (T,i) — U^, after the kth parareal iteration is bounded by 



||u(T„)-U:;|| <6'=/2+iC2y|^e'^«(^"-^'=-^). 

Therefore, as long as the constants in the above error bound remain 0(1) (which 
is observed in practice), the error decreases by a factor of e^^'^ after each iteration. 

Proof. Following the proof of Theorem 1 in |17] , we obtain the bound 
||u(T„)-U^|| < eC^l^^il + CoATr-'-'n'^ 

< eC2^^i^e^«(^"-^'=+^) 

(fc + 1)! 

< k/2+ir, JXnf_ Co{T„-n+i) 
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where the second inequahty used that 1 + x < e^, and the third inequahty used 
that ne < Ne < e^/^ (recall the number N of big time steps is given by = e^^/^). 

□ 



6. Numerical examples 

6.1. Rotating Shallow Water Equations. We consider as a test problem the 
non-dimensional rotating shallow water equations (RSW) equations, 

dvi If T^-] /■:>c)f^\ c)"^! 

-V2 ' 



dt 



e 



j^-i/2dh\ dvi 4 

dv2 1 dv2 „4 

-KT + -vi +r - ■■"^ - 

at e 

dh dvi d 



(6.1) ^ + -'^'1 + ""1^— = ^^^x'"2, 

at e ox 



dt e dx dx 



with spatially periodic boundary conditions on the interval [0, 27r]. Here h{x,t) 
denotes the surface height of the fluid, and v\ {x, t) and V2 {x, t) denote the horizon- 
tal fluid velocities. The non-dimensional parameter e denotes the Rossby number 
(a ratio of the characteristic advection time to the rotation time), and is often 
small (e.g. 10~^) in realistic oceanic flows. The non-dimensional parameter F^^'^e 
gives the Froude number (a ratio of the characteristic fluid velocity to the gravity 
wave speed), where F = O (1) is a free parameter which we set to unity in our 
subsequent calculations. The scaling we have taken is for quasi-geostrophic dynam- 
ics (cf. IH]), which governs the fluid flow dominated by strong stratification and 
strong constant rotation. As is standard, we also introduce a hyperviscosity op- 
erator to prevent singularities from forming; using hyperviscosity also ensures 
that the low frequencies are less effected by dissipation than the high frequencies. 
The RSW equations represent a standard framework in which to develop and test 
new numerical algorithms for geophysical fluid applications. 



To relate equations (6.1) to the abstract formulation (1.1 1, we deflne 



vi {t,x) 

U(i,x) = I V2 {t,x) 

h {t, x) 

Then the system (6.1 1 can be written in the form ( |l.l| by setting 



-1 F-^^^dx \ / 1 \ / VI (vi)^ 

C = \ 1 , 2? = 1 , AA(u) = {V2), 

F-^/^dx / V 1 / \ ('^'^1)=. 



(1.2 



(1.2 



Because we work in a periodic domain, it is also convenient to consider (1.1 1 and 
in the Fourier domain. This will also make explicit the time-averaging in 
(periodicity is not required for our approach, and is only used to simplify the 



numerical scheme). A straightforward calculation (see [2F|) shows that 
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where is a vector that depends on the wavenumber k and a = —1, 0, 1. Therefore, 
by expanding the function u (t) in the basis of eigenfunctions for £, we have that 

1 



As shown in e.g. [37], the nonhnear term e'^'~N (e ^■''u (t)) in (1.1 ) can be written 
in the form 



(6.2) 



ikx 




The interaction coefficients C^'^^'f^^ are expHcitly given in e.g. [27J. From (6.2) 



it is clear that the time average defining M {u{t)) in (1.3) retains only three-wave 
resonances. Indeed, since 



lim ^ 

T^oo T 



ds 



we see that the time average is given by 



, ,a at 



Q2 
fc2 

02 _ Q 



(6.3) 



AA(u(t)) = ^, 



Akx 



k \ (fc.fci ,^2 iQ^^CKl .'^2)£5r 

Here the resonant set S^: is defined by 

S,: = {(fc, ki,k2,a, ai, 02) \ ki + k2 = k and — w)!^^ — aj^;* = 0} . 

The HMM (outlined in Section[2]) allows the resonant terms in (6.3 1 to be efficiently 
computed. 

6.2. Numerical experiments on the RSW equations. We solve equations 



(6.1 1 with the initial conditions consisting of vi (0, a;) = V2 (0, x) ~ and 
(6.4) h (x, 0) = ci (^e-4(---/2)^ sin (3 {x ~ 7r/2)) + e-2(— sin (8 {x - ^))) + 



Co, 



where ci and C2 are chosen so that 

/i (0, x) dx 



0, max |/i (x, 0)1 = 1. 



10-2, e 



We also choose the viscosity parameter 11 — 10^^, and the values e 
10^^, and e — 1 (corresponding to strong, weak, and no scale separation), which 
are physically realistic values in geophysical ocean flows; smaller values of e would 
yield even greater parallel speedup, but are less physically relevant. Although the 



choice of the initial height h{x,0) in (6.4) is somewhat arbitrary, the conclusions 
in this section appear to be insensitive to the initial conditions (as long as they are 
sufficiently smooth). For all three choices of e, we perform the following numerical 
experiments. First, we compute the solution using the asymptotic parareal method 
outlined in Section [3j We then compare the estimated parallel speedup with the 
results of using three numerical integrators in serial: exponential time differencing 
4th order Runge-Kutta (ETDRK4) [9], the operator integrating factor method (cf. 
[20j'). and Strang sphtting (see Algorithm [3] of Section |3|. Finally, for comparison 
we compute the solution using the standard parareal method by solving the full 
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equation (1.1 1 using big and small step sizes AT and At, where we use Strang 
splitting for the coarse solver (we find that using ETDRK4 or OIFS as a coarse 
solver yields similar parallel speedup). For the sake of simplicity, we also assume 
that the cost of computing a single time step using ETDRK4, OIFS, and Strang 
splitting is the same; a more careful analysis that takes into account the number of 
operations required for each integrator would yield greater parallel speedup. 

As we show below, with the coarse time step AT > 3/10, we find similar con- 
vergence and accuracy in the asymptotic parareal method for all values of e. The 
parallel speedup for e — 10^^ is about a factor of 100 relative to using ETDRK4, 
the integrating factor method, or Strang splitting in serial. In contrast, the stan- 
dard parareal method (using ETDKRK4, the integrating factor method, or Strang 
splitting as a coarse solver) requires much smaller time steps, resulting in a parallel 
speedup that is about 10 times smaller for e = 10~^. 

In our first numerical experiment, we take e = 10^^. For the asymptotic parareal 
method, we use a coarse time step AT = 50xe=l/2, a fine time step At = 
e/25 = 1/2500, and N = AT / At = 1250 time intervals [(n - 1) AT, nAT]. Even 
though the solution executes many temporal oscillations within the time intervals 



[nAT, (n -|- 1) AT] — see Figure 6.1 for a plot of the 4th Fourier coefficient h (fc, t) 
for < t < AT — the method converges in a small number of iterations. In fact, 
Figure |6.2| shows the maximum relative L°° error, 

max ||U^-u(nAT,-)|| / |lu (nAT, OIL , 

as a function of the iteration levels fc = 0, 1, . . . , 5. Comparing the parallel speedup 
relative to directly integrating the full equation ( |1.1[ ) using ETDRK4, the integrat- 
ing factor method, and Strang splitting, we find that we need step sizes At = e/25. 
At = e/20, and At = e/20, respectively, in order to obtain a comparable accuracy. 
Therefore, using the complexity analysis in Section [5] we expect a parallel speedup 
of 

N{AT/At) 
5{{AT/At) + N) + N ^ 

To constrast this speedup with that obtained using a standard version of the 
parareal method, we show in Figure |6.2| the results obtained via a coarse solver 



based on solving the full equation ( 1 . 1 1 using Strang splitting (the results are similar 
with using ETDRK4 or the integrating factor method as coarse solvers). In this 
experiment, we take At = e/25 = 1/2500, AT = 4e = 1/25, and TV = AT/A< = 
100, which yields an expected parallel speedup that is 10 times smaller. 

In our second experiment, we take e — 10~^, corresponding to weak scale separa- 
tion. In Figure [673] we show the maximum relative L°° error as a function of the iter- 
ation levels k = 0,1, ... ,5.. Here we used a coarse time step AT = 3e = 3/10, a fine 
time step At = e/25 = 1/250, and N = 150 time intervals [{n - 1) AT, nAT]. Com- 



paring the parallel speedup relative to directly integrating the full equation (1.1) 
using ETDRK4, integrating factor method, and Strang splitting, we find that we 
need time steps At = e/20 = 1/200, At = e/10 = 1/100, and At = e/20 = 1/200, 
respectively, in order to obtain a comparable accuracy. Thus, we obtain an esti- 
mated parallel speed of about 1/6 {AT/ At) = 13 relative to using Strang splitting 
in serial; taking into account the number of operations required for each time step 
in ETDRK4 and the integrating factor method, the estimated parallel speedup rel- 



ative to these integrators will be comparable. In contrast, plot (b) of Figure 6.3 
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(on alogj^o scale), 
„, as a function of 
used, and 



Figure 6.2. Maximum relative L°° error 
maxo<„< jv 1 1 U,^; - u (n AT, • ) | L / 1 1 u (n AT, • ) | 
the iteration level k; the initial condition (6.4) 
e = 10~^. The solid green line depicts the errors from the asymp- 
totic parareal method (Parareal-HMM) with a coarse time step 
AT = 50e, and the dashed red and dashed-dotted blue lines depict 
the errors from the standard parareal method using Strang split- 
ting (Parareal-Strang) with AT = 4e and AT = 5e, respectively. 
A fine time step At = e/25 is used for all three cases. 



™,f -u(iiAr)||^/||u(nAr)||^ as a function of k (logi„ scale); c = 10"' 



Parareal-HMM, Ar=50c 
Parareal, Ar=4c 
Parareal, Ar=5c 




shows the relative L°° errors, where this time the standard parareal algorithm is 
used with Strang splitting for the coarse and fine solvers, and using the same step 
sizes AT = e = 1/20 and At = e/20 = 1/200 (again, similar speedup is obtained 
with ETDRK4 and the integrating factor method). Here the estimated parallel 
speedup is 3 times smaller. 

In our final experiment, we take e = 1, corresponding to no scale separation. In 
Figure [6^ we show the relative L°° error for the asymptotic parareal method, where 
we use a coarse time step AT = 3/10, a fine time step At = 1/200, and = 60 time 
intervals [{n — 1) AT, nAT]. Comparing the parallel speedup relative to directly 
integrating the full equation (UTl, we obtain an estimated parallel speedup of about 
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(on alogj^Q scale), 
„, as a function of 
used, and 



Figure 6.3. Maximum relative L°° error 

maxo<„<7v 1 1 - u (riAT, •) L / 1 1 " ("^^' ' ) I 
the iteration level fc; the initial condition (6.4) 
e = 10~^. The solid green line depicts the errors from the asymp- 
totic parareal method (Parareal-HMM) with a coarse time step 
AT ~ 3e, and the dashed red and dashed-dotted blue lines depict 
the errors from the standard parareal method using Strang split- 
ting (Parareal-Strang) with AT = e and AT = 2e, respectively. A 
fine time step At — e/50 is used for all three cases. 



max||u,';-u(nAr)||„/||u(nAr)||^ as a function of k (login scale); c = 10"' 



Parareal-HMM, Ar=3c 
Parareal, Ar=c 
Parareal, Ar=2c 




Figure 6.4. Maximum relative error (on alogj^g scale), 

maxo<„<Ar ||Ufj — u (tiAT, •)||^ / ||u (rtAT, •)||j^, as a function of 
the iteration level k; the initial condition (6.4) is used, and e = 1. 



The solid green line depicts the errors from the asymptotic parareal 
method (Parareal-HMM) with a coarse time step AT — 3/lOe, and 
the dashed red and dashed-dotted blue lines depict the errors from 
the standard parareal method using Strang splitting (Parareal- 
Strang) with AT = 3/lOe and AT = 4/lOe, respectively. A fine 
time step Af = e/200 is used for all three cases. 



u(7iAr)||^ as a function of k (loguj scale); c = l 



Parareal-HMM, Ar=;VlOc 
Parareal, Ar=3/10c 
Parareal, Ar=4/10c 
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a factor of 5; the speedup with the standard parareal method (and using Strang 
spHtting for a coarse solver) is about the same. 

7. Summary 

In this paper we have introduced an asymptotic-parallel-in-time method for solv- 
ing highly oscillatory PDEs. The method is a modification of the parareal algorithm 
introduced by [24j. The modification replaces the coarse solver used in [24J by a 
numerically computed locally asymtotic solution based on the asymptotic mathe- 
matical structure of the equations ([32J, [27J, [28_, [3(i,) and concepts used in HMM. 

In addition to presenting the method we also include psuedo code. We discuss the 
performance of the method when e =1, which is important for using the method 
in realistic simulations where the time scale separation may vary in space and 
time. We also present a complexity analysis that shows that the parallel speed-up 
increases as e decreases which results in an arbitrarily greater efficiency gain relative 
to standard numerical integrators and a Theorem following [17^ that shows that as 
long as the constants remain bounded the error decreases by a factor of e^/^ after 
each iteration. 

We also present numerical experiments for the shallow water equations. These 
results demonstrate that the parallel speedup is more than 100 relative to expo- 
nential integrators such as ETDRK4 (for realistic parameter values in the shallow 
water equations); the speedup is also more than 10 relative to using the standard 
parareal method with a linearly exact coarse solver. Finally, the results demon- 
strate that the method works in the absence of scale separation, and with as much 
speedup as the standard parareal method. 
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